
Merged Apache bug fixes #100

Merged
markhamstra merged 18 commits into alteryx:csd-1.5 from markhamstra:csd-1.5
Oct 9, 2015

Conversation

@markhamstra

No description provided.

Davies Liu and others added 15 commits September 28, 2015 14:40
The UTF8String may come from an UnsafeRow, in which case its underlying buffer is not copied, so we should clone it in order to hold it in Stats.

cc yhuai

Author: Davies Liu <davies@databricks.com>

Closes apache#8929 from davies/pushdown_string.

(cherry picked from commit ea02e55)
Signed-off-by: Yin Huai <yhuai@databricks.com>
In the course of https://issues.apache.org/jira/browse/LEGAL-226 it came to light that the guidance at http://www.apache.org/dev/licensing-howto.html#permissive-deps on permissively-licensed dependencies has a different interpretation than we (er, I) had been operating under. "pointer ... to the license within the source tree" specifically means a copy of the license within Spark's distribution, whereas at the moment Spark's LICENSE has a pointer to the license in the other project's source tree.

The remedy is simply to inline all such license references (i.e. BSD/MIT licenses), or to include their text in a "licenses" subdirectory and point to that.

Along the way, we can also treat other BSD/MIT licenses, whose text has been inlined into LICENSE, in the same way.

The LICENSE file can continue to provide a helpful list of BSD/MIT licensed projects and a pointer to their sites. This would be over and above including license text in the distro, which is the essential thing.

Author: Sean Owen <sowen@cloudera.com>

Closes apache#8919 from srowen/SPARK-10833.

(cherry picked from commit bf4199e)
Signed-off-by: Sean Owen <sowen@cloudera.com>
…AllocationSuite

Fix the following issues in StandaloneDynamicAllocationSuite:

1. It should not assume master and workers start in order
2. It should not assume master and workers get ready at once
3. It should not assume the application is already registered with master after creating SparkContext
4. It should not access Master.app and idToApp which are not thread safe

The changes includes:
* Use `eventually` to wait until master and workers are ready to fix 1 and 2
* Use `eventually` to wait until the application is registered with master to fix 3
* Use `askWithRetry[MasterStateResponse](RequestMasterState)` to get the application info to fix 4
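The `eventually` pattern referenced above comes from ScalaTest's `Eventually` trait; the helper below is a simplified, self-contained stand-in for illustration, not the suite's actual code:

```scala
// Simplified stand-in for ScalaTest's eventually: retry a block until it
// succeeds or the timeout elapses, instead of assuming readiness up front.
def eventually[T](timeoutMs: Long = 5000, intervalMs: Long = 100)(body: => T): T = {
  val deadline = System.currentTimeMillis() + timeoutMs
  var last: Throwable = null
  while (System.currentTimeMillis() < deadline) {
    try return body
    catch { case e: Throwable => last = e; Thread.sleep(intervalMs) }
  }
  throw new AssertionError(s"condition not met within $timeoutMs ms", last)
}

// In the spirit of the fix (hypothetical names):
// eventually() { assert(getMasterState().workers.size == numWorkers) }
```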

Author: zsxwing <zsxwing@gmail.com>

Closes apache#8914 from zsxwing/fix-StandaloneDynamicAllocationSuite.

(cherry picked from commit dba95ea)
Signed-off-by: Andrew Or <andrew@databricks.com>
Author: Ryan Williams <ryan.blake.williams@gmail.com>

Closes apache#8939 from ryan-williams/errmsg.

(cherry picked from commit b7ad54e)
Signed-off-by: Andrew Or <andrew@databricks.com>
…Suite

Fixed the test failure here: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/

This failure is because `HeartbeatReceiverSuite.heartbeatReceiver` may receive `SparkListenerExecutorAdded("driver")` sent from [LocalBackend](https://github.com/apache/spark/blob/8fb3a65cbb714120d612e58ef9d12b0521a83260/core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala#L121).

There are other race conditions in `HeartbeatReceiverSuite` because `HeartbeatReceiver.onExecutorAdded` and `HeartbeatReceiver.onExecutorRemoved` are asynchronous. This PR also fixed them.

Author: zsxwing <zsxwing@gmail.com>

Closes apache#8946 from zsxwing/SPARK-10058.

(cherry picked from commit 9b3e776)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
The fix is to coerce `c("a", "b")` into a list such that it could be serialized to call JVM with.

Author: felixcheung <felixcheung_m@hotmail.com>

Closes apache#8961 from felixcheung/rselect.

(cherry picked from commit 721e8b5)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
I don't believe the API changed at all.

Author: Avrohom Katz <iambpentameter@gmail.com>

Closes apache#8957 from akatz/kcl-upgrade.

(cherry picked from commit 883bd8f)
Signed-off-by: Sean Owen <sowen@cloudera.com>
`Murmur3_x86_32.hashUnsafeWords` only accepts word-aligned bytes, but the bytes of an unsafe array are not guaranteed to be word-aligned.
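For illustration, a word-aligned length can be obtained by rounding up to a multiple of 8; this sketches the alignment constraint, not the patch itself:

```scala
// Sketch: hashUnsafeWords assumes the length is a multiple of 8 (one JVM
// word), so arbitrary byte arrays must either be padded to a word boundary
// or hashed with a byte-oriented variant instead.
val data = Array[Byte](1, 2, 3, 4, 5)      // 5 bytes: not word-aligned
val paddedLen = (data.length + 7) / 8 * 8  // rounds up to 8
val padded = java.util.Arrays.copyOf(data, paddedLen)
assert(padded.length % 8 == 0)
```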

Author: Wenchen Fan <cloud0fan@163.com>

Closes apache#8987 from cloud-fan/hash.
This should go into 1.5.2 also.

The issue is we were no longer adding the __app__.jar to the system classpath.

Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
Author: Tom Graves <tgraves@yahoo-inc.com>

Closes apache#8959 from tgravescs/SPARK-10901.

(cherry picked from commit e978360)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
This PR implements the following features for both `master` and `branch-1.5`.
1. Display the failed output op count in the batch list
2. Display the failure reason of output op in the batch detail page

Screenshots:
<img width="1356" alt="1" src="https://cloud.githubusercontent.com/assets/1000778/10198387/5b2b97ec-67ce-11e5-81c2-f818b9d2f3ad.png">
<img width="1356" alt="2" src="https://cloud.githubusercontent.com/assets/1000778/10198388/5b76ac14-67ce-11e5-8c8b-de2683c5b485.png">

There are still two remaining problems in the UI.
1. If an output operation doesn't run any Spark job, we cannot get its duration, since it is currently computed as the sum of all jobs' durations.
2. If an output operation doesn't run any Spark job, we cannot get its description, since it is taken from the latest job's call site.

We need to add new `StreamingListenerEvent` about output operations to fix them. So I'd like to fix them only for `master` in another PR.

Author: zsxwing <zsxwing@gmail.com>

Closes apache#8950 from zsxwing/batch-failure.

(cherry picked from commit ffe6831)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Currently, if it isn't set, it scans `/lib/*` and adds every dir to the
classpath, which makes the env too large, and every command called
afterwards fails.

Author: Kevin Cox <kevincox@kevincox.ca>

Closes apache#8994 from kevincox/kevincox-only-add-hive-to-classpath-if-var-is-set.
The created decimal is wrong when using `Decimal(unscaled, precision, scale)` with unscaled > 1e18, precision > 18, and scale > 0.

This bug exists since the beginning.
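The failure mode can be illustrated with plain `java.math` types (a sketch of the arithmetic, not Spark's `Decimal` internals): an unscaled value above ~1e18 overflows a signed 64-bit `Long`, so it must be carried as a `BigInteger`/`BigDecimal` rather than squeezed through a compact long representation.

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

// unscaled > 1e18: does not fit in a signed 64-bit Long without overflow
val unscaled = new BigInteger("1234567890123456789012") // 22 digits
val decimal  = new JBigDecimal(unscaled, 2)             // scale = 2
assert(decimal.toPlainString == "12345678901234567890.12")
```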

Author: Davies Liu <davies@databricks.com>

Closes apache#9014 from davies/fix_decimal.

(cherry picked from commit 37526ac)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
…ifferent Oops size.

UnsafeRow contains 3 pieces of information when pointing to some data in memory (an object, a base offset, and a length). When the row is serialized with Java/Kryo serialization, the object layout in memory can change if two machines have different pointer widths (Oops in JVM).

To reproduce, launch Spark using

MASTER=local-cluster[2,1,1024] bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"

And then run the following

scala> sql("select 1 xx").collect()

Author: Reynold Xin <rxin@databricks.com>

Closes apache#9030 from rxin/SPARK-10914.

(cherry picked from commit 84ea287)
Signed-off-by: Reynold Xin <rxin@databricks.com>
…eaming applications

Dynamic allocation can be painful for streaming apps and can lose data. Log a warning for streaming applications if dynamic allocation is enabled.

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes apache#8998 from harishreedharan/ss-log-error and squashes the following commits:

462b264 [Hari Shreedharan] Improve log message.
2733d94 [Hari Shreedharan] Minor change to warning message.
eaa48cc [Hari Shreedharan] Log a warning instead of failing the application if dynamic allocation is enabled.
725f090 [Hari Shreedharan] Add config parameter to allow dynamic allocation if the user explicitly sets it.
b3f9a95 [Hari Shreedharan] Disable dynamic allocation and kill app if it is enabled.
a4a5212 [Hari Shreedharan] [streaming] SPARK-10955. Disable dynamic allocation for Streaming applications.

(cherry picked from commit 0984129)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
BryanCutler and others added 3 commits October 8, 2015 22:23
…rain with given regParam and convergenceTol parameters

These params were being passed into the StreamingLogisticRegressionWithSGD constructor, but not transferred to the call for model training. Same with StreamingLinearRegressionWithSGD. I added the params as named arguments to the call and also fixed the intercept parameter, which was being passed as the regularization value.
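This bug class is easy to reproduce in miniature; the trainer signature below is hypothetical, not the actual MLlib API:

```scala
// Hypothetical trainer signature: positional arguments can silently bind a
// value to the wrong parameter (e.g. an intercept flag landing in regParam),
// which is what named arguments guard against.
def train(stepSize: Double, regParam: Double = 0.0, convergenceTol: Double = 0.001): String =
  s"stepSize=$stepSize regParam=$regParam convergenceTol=$convergenceTol"

train(0.1, 0.01)                             // is 0.01 regParam or the tolerance?
train(stepSize = 0.1, convergenceTol = 0.01) // named args remove the ambiguity
```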

Author: Bryan Cutler <bjcutler@us.ibm.com>

Closes apache#9002 from BryanCutler/StreamingSGD-convergenceTol-bug-10959.

(cherry picked from commit 5410747)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
…n on Aggregate

For example, we can write `SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1` in PostgreSQL, and we should support this in Spark SQL.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes apache#8548 from cloud-fan/support-order-by-non-attribute.
@yeweizhang

Can we also pull this fix?

https://issues.apache.org/jira/browse/SPARK-10389

This will fix the 100+ failures we ran into when comparing the native and SparkSQL results. Thank you.

@markhamstra
Author

Already did.

markhamstra added a commit that referenced this pull request Oct 9, 2015
@markhamstra markhamstra merged commit ce28740 into alteryx:csd-1.5 Oct 9, 2015